Abstract

Granular association rules reveal patterns hidden in many-to-many relationships,
which are common in relational databases. In recommender systems, these rules
are appropriate for cold start recommendation, where a customer or a product
has just entered the system. An example of such a rule might be "40% of men like
at least 30% of the kinds of alcohol; 45% of customers are men and 6% of products
are alcohol." Mining such rules is a challenging problem due to pattern explosion.
In this paper, we propose a new type of parametric rough sets on two universes to
study this problem. The model is deliberately defined such that the parameter
corresponds to one threshold of rules. With the lower approximation operator
in the new parametric rough sets, a backward algorithm is designed for the rule
mining problem. Experiments on two real world data sets show that the new
algorithm is significantly faster than the existing sandwich algorithm. This study
indicates a new application area, namely recommender systems, of relational
data mining, granular computing and rough sets.

Keywords: Granular association rule, parametric rough sets, recommender 
system, cold start, relational data mining. 
1. Introduction

Recommender systems [H [H Hj suggest products of interest to customers, and
have therefore gained much success in e-commerce and similar applications.
Three types of information are often employed in these systems [28]. The first
is the basic information of customers and products. The second is the explicit
and implicit preferences, including shopping history and product ratings, of the
current customer. The third is the preferences of other customers. There are
two popular recommendation approaches. In content-based recommendation
[3], the system recommends products similar to those a customer has liked. For
example, if Ron has bought red wine, the system may recommend white wine
to him. In collaborative recommendation [51 [33], the system identifies other
customers with similar tastes, and recommends products those customers have
liked. For example, if Ron and Wang have quite similar shopping histories, and
Wang has bought red wine whereas Ron has not, the system may recommend
red wine to Ron. These recommender systems take advantage of all three types
of information. However, for a new customer or a new product, the second type
of information, namely the preferences of the current customer, does not exist.
In this case, we face the cold start problem [10][28].

Granular association rules [22][23] are appropriate for the cold start problem.
To generate this type of rule, the first and the third types of information are
represented as a many-to-many relationship, which is common in relational
databases. In our scenario, the first type of information is represented by two
entities, customer and product, and the third type by a relation buys. Suppose
that customers are described by gender, age, country, etc., and products by
category, country, color, etc. Examples of granular association rules include
"men like alcohol," "young men like French alcohol," and "Chinese men like
blue stuff." Here both customers and products are described through a number
of attributes, thus forming different granules [13 [32 [Ml [SS [SS HO]. These
rules are undoubtedly applicable to both existing and new customers/products.
Therefore they can be employed not only for the cold start problem, but also
for general recommendation.

There are four measures to evaluate the quality of a granular association
rule. A complete example of a granular association rule might be "40% of men
like at least 30% of the kinds of alcohol; 45% of customers are men and 6% of
products are alcohol." Here 45%, 6%, 40%, and 30% are the source coverage, the
target coverage, the source confidence, and the target confidence, respectively.
With these four measures, the strength of the rule is well defined. Therefore
granular association rules are semantically richer than other relational
association rules (see, e.g., [6l[l[i[l3|).

A granular association rule mining problem is defined as finding all granular
association rules given thresholds on the four measures [22]. Similar to other
relational association rule mining problems (see, e.g., [H [5] [HI [3 [5] [IS] ),
this problem is challenging due to pattern explosion. A straightforward sandwich
algorithm has been proposed in [22]. It starts from both entities and proceeds
to the relation. Unfortunately, its time complexity is rather high and its
performance is not satisfactory.

In this paper, we propose a new type of parametric rough sets on two
universes to study the granular association rule mining problem. We borrow some
ideas from variable precision rough sets [41] and rough sets on two universes
[16][20][30] to build the new model. The model is deliberately adjusted such
that the parameter coincides with the target confidence threshold of rules. In
this way, the parameter is semantic and can be specified by the user directly.
We compare our definitions with alternative ones and point out that different
definitions can be employed in different applications. Naturally, our definition
is appropriate for cold start recommendation. We also study some properties
of our new model, especially the monotonicity of the lower approximation.

With the lower approximation of the proposed parametric rough sets, we
design a backward algorithm for rule mining. This algorithm starts from the
second universe and proceeds to the first one, hence it is called a backward
algorithm. Compared with the existing sandwich algorithm [22], the backward
algorithm avoids some redundant computation. Consequently, it has a lower
time complexity.

Experiments are undertaken on two real world data sets. One is the course 
selection data from Zhangzhou Normal University during the semester between 
2011 and 2012. The other is the publicly available MovieLens data set [12]. 
Results show that 1) the backward algorithm is more than 2 times faster than 
the sandwich algorithm; 2) the run time is linear with respect to the data set 
size; and 3) sampling might be a good choice to decrease the run time. 

The rest of the paper is organized as follows. Section 2 reviews granular
association rules through some examples. The rule mining problem is also defined.
Section 3 presents a new model of parametric rough sets on two universes. The
model is defined to cope with the formalization of granular association rules.
Then Section 4 presents a backward algorithm for the problem. Experiments on
the course selection data are discussed in Section 5. Finally, Section 6 presents
some concluding remarks and further research directions.

2. Granular association rules 

In this section, we revisit granular association rules. We will discuss
the data model, the definition, and four measures of such rules. A rule mining
problem will also be presented.

2.1. The data model 

The data model is based on information systems and binary relations. 

Definition 1. $S = (U, A)$ is an information system, where $U = \{x_1, x_2, \dots, x_n\}$
is the set of all objects, $A = \{a_1, a_2, \dots, a_m\}$ is the set of all attributes, and
$a_j(x_i)$ is the value of $x_i$ on attribute $a_j$ for $i \in [1..n]$ and $j \in [1..m]$.

An example of an information system is given by Table 1, where U = {c1, c2,
c3, c4, c5}, and A = {Age, Gender, Married, Country, Income, NumCars}.
Another example is given by Table 2.

In an information system, any $A' \subseteq A$ induces an equivalence relation [26][29]

$$E_{A'} = \{(x, y) \in U \times U \mid \forall a \in A',\ a(x) = a(y)\}, \tag{1}$$

and partitions U into a number of disjoint subsets called blocks. The block
containing $x \in U$ is

$$E_{A'}(x) = \{y \in U \mid \forall a \in A',\ a(y) = a(x)\}. \tag{2}$$

From another viewpoint, a pair $C = (A', x)$ where $x \in U$ and $A' \subseteq A$ is called
a concept. The extension of the concept is

$$ET(C) = ET(A', x) = E_{A'}(x); \tag{3}$$

Table 1: Customer

CID  Name      Age     Gender  Married  Country  Income    NumCars
c1   Ron       20..29  Male    No       USA      60k..69k  0..1
c2   Michelle  20..29  Female  Yes      USA      80k..89k  0..1
c3   Shun      20..29  Male    No       China    40k..49k  0..1
c4   Yamago    30..39  Female  Yes      Japan    80k..89k  2
c5   Wang      30..39  Male    Yes      China    90k..99k  2

Table 2: Product

PID  Name    Country    Category  Color  Price
p1   Bread   Australia  Staple    Black  1..9
p2   Diaper  China      Daily     White  1..9
p3   Pork    China      Meat      Red    1..9
p4   Beef    Australia  Meat      Red    10..19
p5   Beer    France     Alcohol   Black  10..19
p6   Wine    France     Alcohol   White  10..19

while the intension of the concept is the conjunction of respective
attribute-value pairs, i.e.,

$$IT(C) = IT(A', x) = \bigwedge_{a \in A'} (a : a(x)). \tag{4}$$

The support of the concept is the size of its extension divided by the size of the
universe, namely,

$$\mathit{support}(C) = \mathit{support}(A', x) = \mathit{support}\Big(\bigwedge_{a \in A'} (a : a(x))\Big) = \mathit{support}(E_{A'}(x)) = \frac{|E_{A'}(x)|}{|U|}. \tag{5}$$
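As an illustration of Equations (2) and (5), the following minimal Python sketch computes a block and its support on part of Table 1. The dictionary layout and the function names `block` and `support` are our own assumptions for this example, not part of the original formalization.

```python
# Part of Table 1: only the attributes needed for this illustration.
CUSTOMERS = {
    "c1": {"Gender": "Male", "Country": "USA"},
    "c2": {"Gender": "Female", "Country": "USA"},
    "c3": {"Gender": "Male", "Country": "China"},
    "c4": {"Gender": "Female", "Country": "Japan"},
    "c5": {"Gender": "Male", "Country": "China"},
}


def block(table, attrs, x):
    """E_{A'}(x): objects that agree with x on every attribute in attrs (Eq. 2)."""
    return {y for y, row in table.items()
            if all(row[a] == table[x][a] for a in attrs)}


def support(table, attrs, x):
    """|E_{A'}(x)| / |U| (Eq. 5)."""
    return len(block(table, attrs, x)) / len(table)


men = block(CUSTOMERS, ["Gender"], "c1")
chinese_men = block(CUSTOMERS, ["Gender", "Country"], "c3")
print(sorted(men), support(CUSTOMERS, ["Gender"], "c1"))
print(sorted(chinese_men))
```

Adding an attribute to A' can only shrink the block, which is why the concept "Chinese men" is a finer granule than "men".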

Definition 2. Let $U = \{x_1, x_2, \dots, x_n\}$ and $V = \{y_1, y_2, \dots, y_k\}$ be two sets of
objects. Any $R \subseteq U \times V$ is a binary relation from U to V. The neighborhood
of $x \in U$ is

$$R(x) = \{y \in V \mid (x, y) \in R\}. \tag{6}$$

When U = V and R is an equivalence relation, R(x) is the equivalence class
containing x. From this definition we know immediately that for $y \in V$,

$$R^{-1}(y) = \{x \in U \mid (x, y) \in R\}. \tag{7}$$

A binary relation is usually stored in the database as a table with two
foreign keys, which saves storage. For the convenience of illustration,

Table 3: Buys

CID\PID  p1  p2  p3  p4  p5  p6
c1       1   1   0   1   1   0
c2       1   0   0   1   0   1
c3       0   1   1   0   1   1
c4       0   1   0   1   1   0
c5       1   0   1   1   1   1

here we represent it with an $n \times k$ Boolean matrix. An example is given by
Table 3, where U is the set of customers as indicated by Table 1 and V is the
set of products as indicated by Table 2.
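The two neighborhood operators of Definition 2 can be read directly off the Boolean matrix. The sketch below is only an illustration: the matrix is our reading of Table 3 (empty cells taken as 0), and the helper names are assumptions.

```python
CUSTOMER_IDS = ["c1", "c2", "c3", "c4", "c5"]
PRODUCT_IDS = ["p1", "p2", "p3", "p4", "p5", "p6"]

# Rows are customers, columns are products, as read from Table 3.
BUYS = [
    [1, 1, 0, 1, 1, 0],
    [1, 0, 0, 1, 0, 1],
    [0, 1, 1, 0, 1, 1],
    [0, 1, 0, 1, 1, 0],
    [1, 0, 1, 1, 1, 1],
]


def neighborhood(x):
    """R(x): products bought by customer x (Eq. 6)."""
    row = BUYS[CUSTOMER_IDS.index(x)]
    return {p for p, bit in zip(PRODUCT_IDS, row) if bit}


def inverse_neighborhood(y):
    """R^{-1}(y): customers who bought product y (Eq. 7)."""
    col = PRODUCT_IDS.index(y)
    return {c for c, row in zip(CUSTOMER_IDS, BUYS) if row[col]}


print(sorted(neighborhood("c1")))
print(sorted(inverse_neighborhood("p6")))
```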

With Definitions 1 and 2, we propose the following definition.

Definition 3. A many-to-many entity-relationship system (MMER) is a
5-tuple $ES = (U, A, V, B, R)$, where $(U, A)$ and $(V, B)$ are two information
systems, and $R \subseteq U \times V$ is a binary relation from U to V.

An example of MMER is given by Tables 1, 2, and 3.

2.2. Granular association rules with four measures 

Now we come to the central definition of granular association rules. 

Definition 4. A granular association rule is an implication of the form

$$(GR): \bigwedge_{a \in A'} (a : a(x)) \Rightarrow \bigwedge_{b \in B'} (b : b(y)), \tag{8}$$

where $A' \subseteq A$ and $B' \subseteq B$.

According to Equation ([S]), the set of objects meeting the left-hand side of 
the granular association rule is 

$$LH(GR) = E_{A'}(x); \tag{9}$$

while the set of objects meeting the right-hand side of the granular association
rule is

$$RH(GR) = E_{B'}(y). \tag{10}$$

From the MMER given by Tables 1, 2, and 3, we may obtain the following
rule.

(Rule 1) (Gender: Male) ⇒ (Category: Alcohol).

Rule 1 can be read as "men like alcohol." There are some issues concerning 
the strongness of the rule. For example, we may ask the following questions on 
Rule 1: 

1. How many customers are men? 
2. How many products are alcohol?

3. Do all men like alcohol? 

4. Do all kinds of alcohol favor men? 

An example of a complete granular association rule with measures specified
is "40% of men like at least 30% of the kinds of alcohol; 45% of customers are men
and 6% of products are alcohol." Here 45%, 6%, 40%, and 30% are the source
coverage, the target coverage, the source confidence, and the target confidence,
respectively. These measures are defined as follows.

The source coverage of a granular association rule is

$$\mathit{scoverage}(GR) = |LH(GR)| / |U|. \tag{11}$$

The target coverage of GR is

$$\mathit{tcoverage}(GR) = |RH(GR)| / |V|. \tag{12}$$

There is a tradeoff between the source confidence and the target confidence 
of a rule. Consequently, neither value can be obtained directly from the rule. 
To compute any one of them, we should specify the threshold of the other. Let 
tc be the target confidence threshold. The source confidence of the rule is 

$$\mathit{sconfidence}(GR, tc) = \frac{\left|\left\{x \in LH(GR) \,\middle|\, \frac{|R(x) \cap RH(GR)|}{|RH(GR)|} \geq tc\right\}\right|}{|LH(GR)|}. \tag{13}$$

Let mc be the source confidence threshold, and let K be the integer satisfying

$$|\{x \in LH(GR) \mid |R(x) \cap RH(GR)| \geq K + 1\}| < mc \times |LH(GR)| \leq |\{x \in LH(GR) \mid |R(x) \cap RH(GR)| \geq K\}|. \tag{14}$$

This equation means that at least mc × 100% of the elements in LH(GR) have
connections with at least K elements in RH(GR), but fewer than mc × 100% of the
elements in LH(GR) have connections with at least K + 1 elements in RH(GR).
The target confidence of the rule is

$$\mathit{tconfidence}(GR, mc) = K / |RH(GR)|. \tag{15}$$

In fact, the computation of K is non-trivial. First, for any $x \in LH(GR)$, we need
to compute $tc(x) = |R(x) \cap RH(GR)|$ and obtain an array of integers. Second,
we sort the array in descending order. Third, let $k = \lceil mc \times |LH(GR)| \rceil$; K is
the k-th element in the array.
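The two confidence measures can be sketched as follows for Rule 1 on the toy data. This is an illustration under stated assumptions: the relation is our reading of Table 3, the rounding in `tconfidence` is taken to be a ceiling (the choice that satisfies Equation (14)), the small tolerance guards against floating-point noise, and 0 < mc ≤ 1 is assumed.

```python
import math

# R(x) for each customer, as read from Table 3.
R = {
    "c1": {"p1", "p2", "p4", "p5"},
    "c2": {"p1", "p4", "p6"},
    "c3": {"p2", "p3", "p5", "p6"},
    "c4": {"p2", "p4", "p5"},
    "c5": {"p1", "p3", "p4", "p5", "p6"},
}
MEN = {"c1", "c3", "c5"}      # LH(GR) for Rule 1
ALCOHOL = {"p5", "p6"}        # RH(GR) for Rule 1


def sconfidence(lh, rh, tc):
    """Eq. (13): fraction of lh linked to at least tc * |rh| elements of rh."""
    hits = [x for x in lh if len(R[x] & rh) / len(rh) >= tc]
    return len(hits) / len(lh)


def tconfidence(lh, rh, mc):
    """Eq. (15): K / |rh|, where K is the k-th largest count and k = ceil(mc * |lh|)."""
    counts = sorted((len(R[x] & rh) for x in lh), reverse=True)
    k = max(1, math.ceil(mc * len(lh) - 1e-12))
    return counts[k - 1] / len(rh)


print(sconfidence(MEN, ALCOHOL, tc=0.5), sconfidence(MEN, ALCOHOL, tc=1.0))
print(tconfidence(MEN, ALCOHOL, mc=0.6), tconfidence(MEN, ALCOHOL, mc=1.0))
```

Raising one threshold lowers the other measure, which is the tradeoff between source and target confidence described above.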

The relationships between rules are interesting to us. As an example, let us 
consider the following rule: 
(Rule 2) (Gender: Male) ∧ (Country: China)
⇒ (Category: Alcohol) ∧ (Country: France).

Rule 2 can be read as "Chinese men like French alcohol." One may say that
we can infer Rule 2 from Rule 1 since the former has a finer granule. However,
with the four measures we know that the relationship between these two
rules is not so simple. A detailed explanation of Rule 2 might be "60% of Chinese
men like at least 50% of the kinds of French alcohol; 15% of customers are Chinese
men and 2% of products are French alcohol." Compared with Rule 1, Rule 2 is
stronger in terms of source/target confidence, but weaker in terms of source/target
coverage. Therefore if we need rules covering more people and products, we
prefer Rule 1; if we need more confidence in the rules, we prefer Rule 2. For
example, if the source confidence threshold is 55%, Rule 2 might be valid while
Rule 1 is not; if the source coverage threshold is 20%, Rule 1 might be valid while
Rule 2 is not.

2.3. The granular association rule mining problem

A straightforward rule mining problem is as follows.

Problem 5. The granular association rule mining problem.

Input: An $ES = (U, A, V, B, R)$, a minimal source coverage threshold ms,
a minimal target coverage threshold mt, a minimal source confidence threshold
mc, and a minimal target confidence threshold tc.

Output: All granular association rules satisfying $\mathit{scoverage}(GR) \geq ms$,
$\mathit{tcoverage}(GR) \geq mt$, $\mathit{sconfidence}(GR) \geq mc$, and $\mathit{tconfidence}(GR) \geq tc$.

Since both mc and tc are specified, we can choose either Equation (13)
or Equation (15) to decide whether or not a rule satisfies these thresholds.
Equation (13) is a better choice.



3. Parametric rough sets on two universes 

In this section, we first review rough approximations [26] on one universe.
Then we present rough approximations on two universes. Finally we present
parametric rough approximations on two universes. Some concepts are dedicated
to granular association rules. We will explain in detail the way they are
defined from a semantic point of view.

3.1. Classical rough sets 

The classical rough sets [IHl 123 are built upon lower and upper approximations
on one universe. We adopt the notions employed in [31] and define these
concepts as follows.

Definition 6. Let U be a universe and $R \subseteq U \times U$ be an indiscernibility
relation. The lower and upper approximations of $X \subseteq U$ with respect to R are

$$\underline{R}(X) = \{x \in U \mid R(x) \subseteq X\} \tag{16}$$

and

$$\overline{R}(X) = \{x \in U \mid R(x) \cap X \neq \emptyset\}, \tag{17}$$

respectively.






These concepts can be employed for set approximation or classification
analysis. For set approximation, the interval $[\underline{R}(X), \overline{R}(X)]$ is called the rough set
of X, which provides an approximate characterization of X by the objects that
share the same description as its members [31]. For classification analysis, the
lower approximation operator helps find certain rules, while the upper
approximation helps find possible rules.
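Definition 6 can be illustrated with a small sketch. Since an indiscernibility relation partitions the universe, the code works on blocks directly; the toy universe and the function name are assumptions made for this example only.

```python
def approximations(blocks, target):
    """Classical lower and upper approximations of target (Eqs. 16 and 17).

    blocks is the partition induced by the indiscernibility relation, so R(x)
    is the block containing x.
    """
    lower, upper = set(), set()
    for b in blocks:
        if b <= target:      # R(x) is contained in X
            lower |= b
        if b & target:       # R(x) intersects X
            upper |= b
    return lower, upper


BLOCKS = [{1, 2}, {3, 4}, {5, 6}]   # hypothetical partition of U = {1, ..., 6}
X = {1, 2, 3}
lower, upper = approximations(BLOCKS, X)
print(sorted(lower), sorted(upper))
```

The lower approximation is always contained in X and X is always contained in the upper approximation, which is what makes the pair an interval.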

3.2. Probabilistic rough sets on one universe 

Ziarko [41] pointed out that the classical rough sets cannot handle
classification with a controlled degree of uncertainty, or a misclassification error.
Consequently, he proposed variable precision rough sets [41] with a parameter
to indicate the admissible classification error. For the convenience of discussion,
we rewrite his definition as follows.

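Definition 7. Let U be a universe and $R \subseteq U \times U$ be an indiscernibility
relation. The lower and upper approximations of $X \subseteq U$ with respect to R under
precision $\beta$ are

$$\underline{R}_\beta(X) = \left\{x \in U \,\middle|\, \frac{|R(x) \cap X|}{|R(x)|} \geq \beta\right\} \tag{18}$$

and

$$\overline{R}_\beta(X) = \left\{x \in U \,\middle|\, \frac{|R(x) \cap X|}{|R(x)|} > 1 - \beta\right\}, \tag{19}$$

respectively.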



Note that $0.5 < \beta \leq 1$ indicates the classification accuracy (precision)
threshold rather than the misclassification error threshold as employed in [41].
Wong et al. [30] extended the definition to an arbitrary binary relation which is
at least serial. The new pair of approximations have the same forms as in
Definition 7; however, the condition is changed.
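The effect of the precision parameter in Definition 7 can be seen in a short sketch. The partition, the target set, and the function name below are hypothetical and chosen only to make the ratios easy to check by hand.

```python
def vprs_approximations(blocks, target, beta):
    """Variable precision approximations (Eqs. 18 and 19) over a partition."""
    lower, upper = set(), set()
    for b in blocks:
        ratio = len(b & target) / len(b)   # |R(x) ∩ X| / |R(x)| for every x in b
        if ratio >= beta:
            lower |= b
        if ratio > 1 - beta:
            upper |= b
    return lower, upper


BLOCKS = [{1, 2, 3, 4}, {5, 6}, {7}]   # hypothetical partition
X = {1, 2, 3, 5}

relaxed = vprs_approximations(BLOCKS, X, beta=0.7)
strict = vprs_approximations(BLOCKS, X, beta=1.0)
print(relaxed)
print(strict)
```

With beta = 1 the definition falls back to the classical one, so the block {1, 2, 3, 4}, which is 75% inside X, enters the lower approximation only for the relaxed setting.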

Yao and Wong [36] also introduced another type of probabilistic rough sets
in the framework of Bayesian decision theory. It is called the decision theoretic
rough set (DTRS) model. It requires a pair of thresholds $(\alpha, \beta)$ instead of only
one as in variable precision rough sets. Its main advantage is the solid foundation
based on Bayesian decision theory: $(\alpha, \beta)$ can be systematically computed by
minimizing the overall ternary classification cost. Therefore this theory has
drawn much research interest in both theory (see, e.g., [inillH]) and application
(see, e.g., [H EH [37]).

3.3. Rough sets on two universes for granular association rules 

Since our data model is concerned with two universes, we should consider
computation models for this type of data. Rough sets on two universes have
been defined in [30]. Some later works adopt the same definitions (see, e.g.,
[inilinj). We will present our definitions which cope with granular association
rules. Then we discuss why they are different from existing ones.






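Definition 8. Let U and V be two universes, and $R \subseteq U \times V$ be a binary relation.
The lower and upper approximations of $X \subseteq U$ with respect to R are

$$\underline{R}(X) = \{y \in V \mid R^{-1}(y) \supseteq X\} \tag{20}$$

and

$$\overline{R}(X) = \{y \in V \mid R^{-1}(y) \cap X \neq \emptyset\}, \tag{21}$$

respectively.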

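From this definition we know immediately that for $Y \subseteq V$,

$$\underline{R^{-1}}(Y) = \{x \in U \mid R(x) \supseteq Y\}, \tag{22}$$

$$\overline{R^{-1}}(Y) = \{x \in U \mid R(x) \cap Y \neq \emptyset\}. \tag{23}$$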

Now we explain these notions through our example. $\underline{R}(X)$ contains products
that favor all people in X, $\underline{R^{-1}}(Y)$ contains people who like all products in
Y, $\overline{R}(X)$ contains products that favor at least one person in X, and $\overline{R^{-1}}(Y)$
contains people who like at least one product in Y.
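The four operators of Equations (20) through (23) can be checked on the running example. The sketch below assumes the relation as we read it from Table 3; the function names are ours.

```python
# R(x) for each customer, as read from Table 3.
R = {
    "c1": {"p1", "p2", "p4", "p5"},
    "c2": {"p1", "p4", "p6"},
    "c3": {"p2", "p3", "p5", "p6"},
    "c4": {"p2", "p4", "p5"},
    "c5": {"p1", "p3", "p4", "p5", "p6"},
}
PRODUCTS = {"p1", "p2", "p3", "p4", "p5", "p6"}


def r_inv(y):
    """R^{-1}(y): customers who bought product y."""
    return {x for x, ys in R.items() if y in ys}


def lower(X):          # Eq. (20): products bought by everyone in X
    return {y for y in PRODUCTS if r_inv(y) >= X}


def upper(X):          # Eq. (21): products bought by someone in X
    return {y for y in PRODUCTS if r_inv(y) & X}


def lower_inv(Y):      # Eq. (22): customers who bought every product in Y
    return {x for x in R if R[x] >= Y}


def upper_inv(Y):      # Eq. (23): customers who bought some product in Y
    return {x for x in R if R[x] & Y}


MEN = {"c1", "c3", "c5"}
ALCOHOL = {"p5", "p6"}
print(sorted(lower(MEN)), sorted(upper(MEN)))
print(sorted(lower_inv(ALCOHOL)), sorted(upper_inv(ALCOHOL)))
```

On this toy data only p5 is bought by every man, while every product is bought by at least one man, which shows how strict the lower approximation is and how weak the upper one is.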

We have the following property concerning the monotonicity of these ap- 
proximations. 

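Property 9. Let $X_1 \subset X_2$. Then

$$\underline{R}(X_1) \supseteq \underline{R}(X_2), \tag{24}$$

$$\overline{R}(X_1) \subseteq \overline{R}(X_2). \tag{25}$$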

That is, with the increase of the object subset, the lower approximation
decreases while the upper approximation increases. The decrease of the lower
approximation in this case may look unusual to people in the rough set
community. In fact, according to Wong et al. [30] and Liu [20], Equation (20)
should be rewritten as

$$\underline{R}'(X) = \{y \in V \mid R^{-1}(y) \subseteq X\}, \tag{26}$$

where the prime is employed to distinguish between the two definitions. In this
way, $\underline{R}'(X_1) \subseteq \underline{R}'(X_2)$. Moreover, it would coincide with Equation (16) when
U = V.

We argue that different definitions of the lower approximation are appropriate
for different applications. Suppose that there is a clinic system where U is
the set of all symptoms and V is the set of all diseases [30]. $X \subseteq U$ is a set of
symptoms. According to Equation (26), $\underline{R}'(X)$ contains diseases that are induced
only by symptoms in X. That is, if a patient has no symptom in X, she never
has any disease in $\underline{R}'(X)$. This type of rule is natural and useful.

In our example presented in Section 2, $X \subseteq U$ is a group of people. $\underline{R}'(X)$
contains products that favor only people in X. That is, if a person does not
belong to X, she never likes products in $\underline{R}'(X)$. Unfortunately, this type of
rule is not interesting to us. This is why we employ Equation (20) for the lower
approximation.

We are looking for very strong rules through the lower approximation
indicated in Definition 8, for example, "all men like all alcohol." Rules of this kind
are called complete match rules [23]. However, they seldom exist in applications.
On the other hand, we are looking for very weak rules through the upper
approximation, for example, "at least one man likes at least one kind of alcohol."
Another extreme example is "all people like at least one kind of product," which
holds for any data set. Therefore rules of this kind are useless. These issues will
be addressed through a more general model in the next subsection.

3.4. Parametric rough sets on two universes for granular association rules

Given a group of people, the number of products that favor all of them is 
often quite small. On the other hand, the number of products that favor at 
least one of them is not quite meaningful. Similar to probabilistic rough sets, 
we need to introduce one or more parameters to the model. 

To cope with the source confidence measure introduced in Section 2.2,
we propose the following definition.

Definition 10. Let U and V be two universes, $R \subseteq U \times V$ be a binary relation,
and $0 < \beta \leq 1$ be a user-specified threshold. The lower approximation of $X \subseteq U$
with respect to R for threshold $\beta$ is

$$\underline{R}_\beta(X) = \left\{y \in V \,\middle|\, \frac{|R^{-1}(y) \cap X|}{|X|} \geq \beta\right\}. \tag{27}$$

We do not discuss the upper approximation in the new context due to its lack of
semantics.

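From this definition we know immediately that the lower approximation of
$Y \subseteq V$ with respect to R is

$$\underline{R^{-1}}_\beta(Y) = \left\{x \in U \,\middle|\, \frac{|R(x) \cap Y|}{|Y|} \geq \beta\right\}. \tag{28}$$

Here $\beta$ corresponds to the target confidence instead. In our example, $\underline{R}_\beta(X)$
contains products that favor at least $\beta \times 100\%$ of the people in X, and $\underline{R^{-1}}_\beta(Y)$
contains people who like at least $\beta \times 100\%$ of the products in Y.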
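A minimal sketch of Equations (27) and (28) on the running example follows. The relation is our reading of Table 3, the function names are assumptions, and the ratio test is written as a comparison of counts to avoid division.

```python
# R(x) for each customer, as read from Table 3.
R = {
    "c1": {"p1", "p2", "p4", "p5"},
    "c2": {"p1", "p4", "p6"},
    "c3": {"p2", "p3", "p5", "p6"},
    "c4": {"p2", "p4", "p5"},
    "c5": {"p1", "p3", "p4", "p5", "p6"},
}
PRODUCTS = {"p1", "p2", "p3", "p4", "p5", "p6"}
R_INV = {y: {x for x, ys in R.items() if y in ys} for y in PRODUCTS}


def lower_beta(X, beta):
    """Eq. (27): products bought by at least beta * |X| members of X."""
    return {y for y in PRODUCTS if len(R_INV[y] & X) >= beta * len(X)}


def lower_inv_beta(Y, beta):
    """Eq. (28): customers who bought at least beta * |Y| products of Y."""
    return {x for x in R if len(R[x] & Y) >= beta * len(Y)}


USA = {"c1", "c2"}            # customers from the USA in Table 1
ALCOHOL = {"p5", "p6"}
print(sorted(lower_beta(USA, 1.0)), sorted(lower_beta(USA, 0.5)))
print(sorted(lower_inv_beta(ALCOHOL, 1.0)), sorted(lower_inv_beta(ALCOHOL, 0.5)))
```

For a fixed X, lowering the threshold can only add elements, so the result for 0.5 contains the result for 1.0.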

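The following property indicates that $\underline{R}_\beta(X)$ is a generalization of both
$\underline{R}(X)$ and $\overline{R}(X)$.

Property 11. Let U and V be two universes, and $R \subseteq U \times V$ be a binary relation.
Then

$$\underline{R}_1(X) = \underline{R}(X), \tag{29}$$

$$\underline{R}_\varepsilon(X) = \overline{R}(X), \tag{30}$$

where $\varepsilon \approx 0$ is a small positive number.

Proof. $\frac{|R^{-1}(y) \cap X|}{|X|} \geq 1 \Leftrightarrow |R^{-1}(y) \cap X| = |X| \Leftrightarrow R^{-1}(y) \supseteq X$.
$\frac{|R^{-1}(y) \cap X|}{|X|} \geq \varepsilon \Leftrightarrow |R^{-1}(y) \cap X| > 0 \Leftrightarrow R^{-1}(y) \cap X \neq \emptyset$.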

The following property shows the monotonicity of $\underline{R}_\beta(X)$ with respect to $\beta$.

Property 12. Let $0 < \beta_1 < \beta_2 \leq 1$. Then

$$\underline{R}_{\beta_2}(X) \subseteq \underline{R}_{\beta_1}(X). \tag{31}$$
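The inclusion follows in one step from the threshold condition of Definition 10. The chain below is our own short justification, assuming the membership test is "ratio at least the threshold":

```latex
y \in \underline{R}_{\beta_2}(X)
\;\Rightarrow\; \frac{|R^{-1}(y) \cap X|}{|X|} \geq \beta_2 > \beta_1
\;\Rightarrow\; y \in \underline{R}_{\beta_1}(X).
```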

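However, given $X_1 \subset X_2$, we obtain neither $\underline{R}_\beta(X_1) \subseteq \underline{R}_\beta(X_2)$ nor
$\underline{R}_\beta(X_1) \supseteq \underline{R}_\beta(X_2)$ in general. The relationship between $\underline{R}_\beta(X_1)$ and
$\underline{R}_\beta(X_2)$ depends on $\beta$. Generally, if $\beta$ is big, $\underline{R}_\beta(X_1)$ tends to be the
bigger of the two; otherwise $\underline{R}_\beta(X_1)$ tends to be the smaller. By Property 11,
Equation (24) indicates the extreme case for $\beta = 1$, and Equation (25)
indicates the other extreme case for $\beta \approx 0$.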
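A concrete counterexample may help here. The sketch below uses the relation as we read it from Table 3 and two customer sets chosen by us; for an intermediate threshold neither inclusion holds, while the two extremes behave as Equations (24) and (25) state.

```python
# R^{-1}(y) for each product, as read from Table 3.
R_INV = {
    "p1": {"c1", "c2", "c5"},
    "p2": {"c1", "c3", "c4"},
    "p3": {"c3", "c5"},
    "p4": {"c1", "c2", "c4", "c5"},
    "p5": {"c1", "c3", "c4", "c5"},
    "p6": {"c2", "c3", "c5"},
}


def lower_beta(X, beta):
    """Eq. (27): products bought by at least beta * |X| members of X."""
    return {y for y, buyers in R_INV.items() if len(buyers & X) >= beta * len(X)}


X1 = {"c1"}
X2 = {"c1", "c2", "c3", "c5"}          # X1 is a proper subset of X2

a, b = lower_beta(X1, 0.75), lower_beta(X2, 0.75)
print(sorted(a), sorted(b))            # p2 is only in a, p6 is only in b
```

Product p2 is bought by c1 but by only half of X2, whereas p6 is bought by three quarters of X2 but not by c1, so the two sets are incomparable at this threshold.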

$\beta$ is the coverage of R(x) (or $R^{-1}(y)$) to Y (or X). It does not mean the
precision of an approximation. This is why we call this model parametric rough
sets instead of variable precision rough sets [41] or probabilistic rough sets [35].

Similar to the discussion in Section 3.3, in some cases we would like to employ
the following definition:

$$\underline{R}'_\beta(X) = \left\{y \in V \,\middle|\, \frac{|R^{-1}(y) \cap X|}{|R^{-1}(y)|} \geq \beta\right\}. \tag{32}$$

It coincides with $\underline{R}_\beta(X)$ defined in Definition 7 if U = V. Take the clinic system
again as the example. $\underline{R}'_\beta(X)$ is the set of diseases that are caused mainly (with
a probability no less than $\beta \times 100\%$) by symptoms in X.



4. A backward algorithm to granular association rule mining 

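In our previous work [22], we have proposed an algorithm according to
Equation (13). The algorithm starts from both sides and checks the validity of all
candidate rules. Therefore it was named a sandwich algorithm.

To make use of the concept proposed in Section 3, we rewrite
Equation (13) as follows:

$$\begin{aligned}
\mathit{sconfidence}(GR, tc)
&= \frac{\left|\left\{x \in LH(GR) \,\middle|\, \frac{|R(x) \cap RH(GR)|}{|RH(GR)|} \geq tc\right\}\right|}{|LH(GR)|} \\
&= \frac{\left|\left\{x \in U \,\middle|\, \frac{|R(x) \cap RH(GR)|}{|RH(GR)|} \geq tc\right\} \cap LH(GR)\right|}{|LH(GR)|} \\
&= \frac{\left|\underline{R^{-1}}_{tc}(RH(GR)) \cap LH(GR)\right|}{|LH(GR)|}.
\end{aligned} \tag{33}$$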

With this equation, we propose an algorithm to deal with Problem 5. The
algorithm is listed in Algorithm 1. It essentially has four steps.

Step 1. Build all concepts meeting the minimal source coverage threshold
ms from (U, A). These concepts are candidates for LH(GR). This step
corresponds to Line 1 of the algorithm, where SC stands for source concept.

Step 2. Build all concepts meeting the minimal target coverage threshold mt
from (V, B). These concepts are candidates for RH(GR). This step corresponds
to Line 2 of the algorithm, where TC stands for target concept.

Step 3. Pick up a concept from TC(mt) and compute its lower approximation
with parameter tc. This step corresponds to Lines 3 through 5 of the
algorithm.

Algorithm 1 The backward algorithm
Input: ES = (U, A, V, B, R), ms, mt, mc, tc.
Output: All granular association rules satisfying given thresholds.
Method: backward

 1: $SC(ms) = \{(A', x) \in 2^A \times U \mid |E_{A'}(x)| / |U| \geq ms\}$; // Candidate source concepts
 2: $TC(mt) = \{(B', y) \in 2^B \times V \mid |E_{B'}(y)| / |V| \geq mt\}$; // Candidate target concepts
 3: for each $C' \in TC(mt)$ do
 4:   $Y = ET(C')$;
 5:   $X = \underline{R^{-1}}_{tc}(Y)$;
 6:   for each $C \in SC(ms)$ do
 7:     if ($|X \cap ET(C)| / |ET(C)| \geq mc$) then
 8:       output rule $IT(C) \Rightarrow IT(C')$;
 9:     end if
10:   end for
11: end for



Step 4. Pick up a concept from SC(ms) and build a candidate rule. At the
same time, check the validity of the rule with threshold mc. This step corresponds
to Lines 6 through 10 of the algorithm. Although all candidates are checked, only
valid rules are explicitly generated. The rule is $GR: IT(C) \Rightarrow IT(C')$.
Therefore $LH(GR) = ET(C)$, $RH(GR) = ET(C')$, and

$$\mathit{sconfidence}(GR, tc) = |X \cap ET(C)| / |ET(C)|. \tag{34}$$
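The four steps can be sketched in Python as follows. This is a minimal illustration and not the implementation used in the experiments: concepts are enumerated by brute force, the data are a reduced version of Tables 1 through 3 (one attribute per entity, relation as we read it), and all function names are assumptions.

```python
from itertools import combinations


def concepts(table, min_coverage):
    """Map each intension to its extension, keeping concepts with enough coverage."""
    attrs = sorted(next(iter(table.values())))
    found = {}
    for size in range(1, len(attrs) + 1):
        for subset in combinations(attrs, size):
            for row in table.values():
                intension = tuple((a, row[a]) for a in subset)
                if intension in found:
                    continue
                extension = frozenset(
                    obj for obj, other in table.items()
                    if all(other[a] == v for a, v in intension))
                if len(extension) >= min_coverage * len(table):
                    found[intension] = extension
    return found


def lower_inverse(relation, Y, beta):
    """Eq. (28): objects of U linked to at least beta * |Y| elements of Y."""
    return {x for x, ys in relation.items() if len(ys & Y) >= beta * len(Y)}


def backward(sources, targets, relation, ms, mt, mc, tc):
    source_concepts = concepts(sources, ms)                  # Line 1
    target_concepts = concepts(targets, mt)                  # Line 2
    rules = []
    for rhs, Y in target_concepts.items():                   # Lines 3-4
        X = lower_inverse(relation, Y, tc)                   # Line 5, once per right-hand side
        for lhs, extension in source_concepts.items():       # Line 6
            if len(X & extension) >= mc * len(extension):    # Line 7
                rules.append((lhs, rhs))                     # Line 8
    return rules


CUSTOMERS = {"c1": {"Gender": "Male"}, "c2": {"Gender": "Female"},
             "c3": {"Gender": "Male"}, "c4": {"Gender": "Female"},
             "c5": {"Gender": "Male"}}
PRODUCTS = {"p1": {"Category": "Staple"}, "p2": {"Category": "Daily"},
            "p3": {"Category": "Meat"}, "p4": {"Category": "Meat"},
            "p5": {"Category": "Alcohol"}, "p6": {"Category": "Alcohol"}}
BUYS = {"c1": {"p1", "p2", "p4", "p5"}, "c2": {"p1", "p4", "p6"},
        "c3": {"p2", "p3", "p5", "p6"}, "c4": {"p2", "p4", "p5"},
        "c5": {"p1", "p3", "p4", "p5", "p6"}}

rules = backward(CUSTOMERS, PRODUCTS, BUYS, ms=0.5, mt=0.3, mc=0.6, tc=1.0)
print(rules)
```

The lower approximation X is computed once per target concept and reused for every source concept, which is the saving over checking each candidate rule from scratch.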

Because the algorithm starts from the right-hand side of the rule and
proceeds to the left-hand side, it is called a backward algorithm. It is necessary to
compare the time complexities of the existing sandwich algorithm and our new
backward algorithm. Both algorithms share Steps 1 and 2, which do not incur
the pattern explosion problem. Therefore we will focus on the remaining steps.
The time complexity of the sandwich algorithm is [35]

$$O(|SC(ms)| \times |TC(mt)| \times |U| \times |V|), \tag{35}$$

where $|\cdot|$ denotes the cardinality of a set.

The computation of the lower approximation in Line 5 of our backward algorithm
takes $O(|U| \times |V|)$ time and is executed once per target concept. The if
statement in Line 7 takes $O(|U|)$ time and is executed once per pair of source
and target concepts. Hence the time complexity of the backward algorithm is

$$O(|TC(mt)| \times |U| \times (|SC(ms)| + |V|)), \tag{36}$$

which is lower than that of the sandwich algorithm.

Intuitively, the backward algorithm avoids computing $R(x) \cap RH(GR)$ for
different rules with the same right-hand side. Hence it should be less time
consuming than the sandwich algorithm. We will compare the run time of these
algorithms in the next section through experimentation.

The space complexities of these two algorithms are also important. To store
the relation R, a $|U| \times |V|$ Boolean matrix is needed.

5. Experiments on two real world data sets 

The main purpose of our experiments is to answer the following questions. 

1. Does the backward algorithm outperform the sandwich algorithm? 

2. How does the number of rules change for different numbers of objects?

3. How does the algorithm run time change for different numbers of objects?

4. How does the number of rules change for different thresholds? 

5.1. Data sets 

We collected two real world data sets for experimentation. One is course 
selection, and the other is movie rating. These data sets are quite representative 
for applications. 

5.1.1. A course selection data set 

The course selection system often serves as an example in textbooks to
explain the concept of many-to-many entity-relationship diagrams. Hence it is
appropriate for producing meaningful granular association rules and testing the
performance of our algorithm. We obtained a data set from the course selection
system of Zhangzhou Normal University¹. Specifically, we collected data during
the semester between 2011 and 2012. There are 145 general education courses
in the university. 9,654 students took part in course selection. The database
schema is as follows.

• Student (studentID, name, gender, birth-year, politics-status, grade,
department, nationality, length-of-schooling)

• Course (courseID, credit, class-hours, availability, department)

• Selects (studentID, courseID)

Our algorithm supports only nominal data at this time. For this data set,
all data are viewed as nominal directly. In this way no discretization approach
is employed to convert numeric values into nominal ones. We also removed
student names and course names from the original data since they are useless in
generating meaningful rules.

5.1.2. A movie rating data set 

The MovieLens data set assembled by the GroupLens project² is widely used
in recommender systems (see, e.g., OHH]). We downloaded the data set from
the Internet Movie Database³. Originally, the data set contains 100,000 ratings
(1-5) from 943 users on 1,682 movies, with each user rating at least 20 movies
[25]. Currently, the available data set contains 1,000,209 anonymous ratings of
3,952 movies made by 6,040 MovieLens users who joined MovieLens in 2000.
Due to the data set reading limitation of Weka [TT] and Coser [5S], we use the
first 3,800 users and all movies.

¹ We would like to thank Mrs. Chunmei Zhou for her help in the data collection.
² http://www.grouplens.org
³ http://movielens.umn.edu

In order to run our algorithm, we preprocessed the data set as follows.

1. Remove movie names. They are not useful in generating meaningful
granular association rules.

2. Use release year instead of release date. In this way the granule is more 
reasonable. 

3. Select the movie genre. In the original data, the movie genre is multi- 
valued since one movie may fall in more than one genre. For example, 
a movie can be both Animation and Children's. Unfortunately, granular 
association rules do not support this type of data at this time. Since the 
main objective of this work is to compare the performances of algorithms, 
we use a simple approach to deal with this issue. That is to sort movie 
genres according to the number of users they attract, and only keep the 
highest priority genre for the current movie. We adopt the following pri- 
ority (from high to low): Comedy, Action, Thriller, Romance, Adventure, 
Children, Crime, Sci-Fi, Horror, War, Mystery, Musical, Documentary, 
Animation, Western, FilmNoir, Fantasy, Unknown.
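The genre selection in step 3 amounts to taking the highest-priority label of each movie. A minimal sketch, assuming genres are available as lists of the labels above (the function name and the sample inputs are ours):

```python
# Priority order copied from the list above, highest priority first.
PRIORITY = [
    "Comedy", "Action", "Thriller", "Romance", "Adventure", "Children",
    "Crime", "Sci-Fi", "Horror", "War", "Mystery", "Musical", "Documentary",
    "Animation", "Western", "FilmNoir", "Fantasy", "Unknown",
]
RANK = {genre: position for position, genre in enumerate(PRIORITY)}


def primary_genre(genres):
    """Keep only the highest-priority genre of a multi-valued genre list."""
    return min(genres, key=RANK.__getitem__)


print(primary_genre(["Animation", "Children"]))
print(primary_genre(["Sci-Fi", "Action", "Adventure"]))
```

This turns the multi-valued genre attribute into a nominal one, at the cost of discarding the secondary genres.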

Our database schema is as follows. 

• User (userID, age, gender, occupation)

• Movie (movieID, releaseYear, genre)

• Rates (userID, movieID)

There are 8 user age intervals, 21 occupations and 71 release years. Similar to
the course selection data set, all these data are viewed as nominal and processed
directly. We employ neither discretization nor symbolic value partition [24]
approaches to produce coarser granules.

5.2. Results 

We undertake four sets of experiments to answer the questions proposed at 
the beginning of this section. 

5.2.1. Efficiency comparison 

We compare the efficiencies of the backward and the sandwich algorithms.
We look at only the run time of Lines 3 through 11, since these lines are where
the two algorithms differ.

For the course selection data set, when ms = mt = 0.06, mc = 0.18, tc =
0.11, we obtain only 40 rules. For higher thresholds, no rule can be obtained.
Therefore we use the following settings: mc = 0.18, tc = 0.11, ms = mt, and
ms ∈ {0.02, 0.03, 0.04, 0.05, 0.06}. Figure 1(a) shows the actual run time in
milliseconds. Figure 2(a) shows the number of basic operations, including
addition, comparison, etc. of numbers. Here we observe that for different
settings, the backward algorithm is more than 2 times faster than the sandwich
algorithm, and it takes less than 1/3 of the operations of the sandwich
algorithm.

Figure 1: Run time information: (a) course selection, (b) MovieLens (3,800 users).

For the MovieLens data set, we employ the data set with 3,800 users and
3,952 movies. We use the following settings: mc = 0.15, tc = 0.14, ms = mt,
and ms ∈ {0.01, 0.02, 0.03, 0.04, 0.05}. Figure 1(b) shows the actual run time
in milliseconds. Figure 2(b) shows the number of basic operations, including
addition, comparison, etc. of numbers. Here we observe that for different
settings, the backward algorithm is more than 3 times faster than the sandwich
algorithm, and it takes less than 1/4 of the operations of the sandwich
algorithm.

5.2.2. Change of number of rules for different data set sizes 

Now we study how the number of rules changes with the increase of the 
data set size. The experiments are undertaken only on the MovieLens data set. 
We use the following settings: mc = 0.15, tc = 0.14, ms ∈ {0.01, 0.02}, and
|U| ∈ {500, 1,000, 1,500, 2,000, 2,500, 3,000, 3,500}. The number of movies is
always 3,952. When selecting k users, we always select from the first user to
the k-th user.
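The user sampling described above can be sketched as follows. This is a sketch under the assumption that "first" means the smallest user identifiers; the function name `first_k_users` and the representation of the relation as a set of (userID, movieID) pairs are hypothetical.

```python
def first_k_users(user_ids, rates, k):
    """Keep the first k users (by identifier) and restrict the relation to them.

    The movie side is left untouched, so the number of movies stays fixed.
    """
    kept = set(sorted(user_ids)[:k])
    sub_rates = {(u, m) for (u, m) in rates if u in kept}
    return kept, sub_rates


# Data set sizes used in the experiment, as listed in the text.
SIZES = [500, 1000, 1500, 2000, 2500, 3000, 3500]
```

Since each smaller sample is a prefix of every larger one, the samples are nested, which makes the counts obtained for different sizes directly comparable.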

First we look at the number of concepts satisfying the source confidence
threshold ms. According to Figure 3(a), the number of source concepts
decreases with the increase of the number of users. However, Figure 3(b)
indicates that this trend may not hold. In fact, from Figure 3, the most
important observation is that the number of source concepts does not vary
much with the change of the number of objects. When the number of users is
more than 1,500, this variation is no more than 3, which is less than 5% of
the total number of concepts.

Second we look at the number of granular association rules satisfying all four
thresholds. Figure 4 indicates that the number of rules varies more than the
number of source concepts. However, this variation is less than 20% when there
are more than 1,500 users, or less than 10% when there are more than 2,500
users.

Figure 2: Basic operations information: (a) course selection, (b) MovieLens (3,800 users).

5.2.3. Change of run time for different data set sizes 

We look at the run time change with the increase of the number of users. The
time complexity of the algorithm is given by Equation (36). Since the number
of movies is not changed, |TC(mt)| + |V| is fixed. Moreover, according to our
earlier discussion, |SC(ms)| does not vary much for different numbers of users.
Therefore the time complexity is nearly linear with respect to the number of
users. Figure 5 validates this analysis.

5.2.4. Number of rules for different thresholds

Figure 6(a) shows that the number of rules decreases dramatically with the
increase of ms and mt. For the course selection data set, the number of rules
would be 0 when ms = mt = 0.07. For the MovieLens data set, the number of
rules would be 0 when ms = mt = 0.06.
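The effect of the thresholds can be illustrated with a small sketch. It follows our reading of the example rule in the abstract ("40% men like at least 30% kinds of alcohol; 45% customers are men and 6% products are alcohol"): ms and mt bound the coverage of the source and target granules, tc is the fraction of the target granule an object must be related to, and mc bounds the fraction of source objects that reach tc. The function names and the toy data are assumptions; the code is a brute-force check, not the backward or the sandwich algorithm.

```python
from itertools import product


def coverage(granule, universe):
    """Fraction of the universe covered by a granule."""
    return len(granule) / len(universe)


def source_confidence(source, target, relation, tc):
    """Fraction of source objects related to at least tc of the target granule."""
    hits = 0
    for x in source:
        liked = sum(1 for y in target if (x, y) in relation)
        if liked / len(target) >= tc:
            hits += 1
    return hits / len(source)


def count_rules(sources, targets, users, items, relation, ms, mt, mc, tc):
    """Count (source, target) granule pairs that satisfy all four thresholds."""
    count = 0
    for source, target in product(sources, targets):
        if coverage(source, users) < ms or coverage(target, items) < mt:
            continue  # a coverage threshold is violated
        if source_confidence(source, target, relation, tc) >= mc:
            count += 1
    return count


# Toy data for illustration only (non-empty granules are assumed).
users = {1, 2, 3, 4}
items = {"a", "b", "c", "d"}
relation = {(1, "a"), (1, "b"), (2, "a"), (3, "c")}
sources = [{1, 2}, {3}]
targets = [{"a", "b"}, {"c"}]

# Raise ms = mt while keeping mc and tc fixed, as in the experiment.
counts = [
    count_rules(sources, targets, users, items, relation, t, t, 0.5, 0.5)
    for t in (0.25, 0.5, 0.75)
]
```

Raising ms and mt can only remove candidate granules, so the rule count in `counts` never increases from one threshold to the next, which matches the trend reported above.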

5.3. Discussions 

Now we can answer the questions proposed at the beginning of this section. 

1. The backward algorithm outperforms the sandwich algorithm. The back- 
ward algorithm is more than 2 times and 3 times faster than the sandwich 
algorithm on the course selection and MovieLens data sets, respectively. 
Therefore our parametric rough sets on two universes are useful in appli- 
cations. 
Figure 3: Number of concepts on users for MovieLens: (a) ms = 0.01, (b) ms = 0.02.



2. The number of rules does not change much for different numbers of objects.
Therefore it is not necessary to collect a very large amount of data to
obtain meaningful granular association rules. For example, for the MovieLens
data set, 3,000 users are sufficient.

3. The run time is nearly linear with respect to the number of objects.
Therefore the algorithm is scalable from the viewpoint of time complexity.
However, we observe that the relation table might be rather big, and this
could become a bottleneck of the algorithm.

4. The number of rules decreases dramatically with the increase of thresholds 
ms and mt. It is important to specify appropriate thresholds to obtain 
useful rules. 



6. Conclusions and further works 

In this paper, we have proposed a new type of parametric rough sets on two 
universes to deal with the granular association rule mining problem. The lower 
approximation operator is defined, and its monotonicity is analyzed. With the 
help of the new model, a backward algorithm for the granular association rule
mining problem is proposed. The new algorithm is significantly faster than the
existing sandwich algorithm.

The following research topics deserve further investigation: 

1. Multi-valued data. The MovieLens data set contains the multi-valued
attribute genre. In this work we employed a simple approach to deal with
it. This is partly because the focus of this work is on the algorithm
efficiency. However, more work should be undertaken to fully explore this
type of data.
Figure 4: Number of granular association rules for MovieLens: (a) ms = 0.01, (b) ms = 0.02.

2. Symbolic value partition. For some attributes, there may exist many
attribute values. For example, the occupation attribute of users in the
MovieLens data set contains 18 values. To obtain a more appropriate
granule, we may need to preprocess them through symbolic value partition
[24].

3. Rule testing and validating. This work only focuses on how to obtain rules
with user-specified thresholds. However, we should test the usefulness
of these rules through the training and testing scenario. We should also
compare the performance of our approach with other cold start
recommendation approaches.

To sum up, this work applies rough set theory to recommender systems. We 
can apply the new model to similar problems [21] where approximations are 
fundamental. Therefore this work is one step toward the application of rough 
set theory and granular computing. 